Subject: Comments on the CD1.7 draft (1) - given id of 99-xxxx

Author: Robert Jones, email: 100621.553@compuserve.com

References:

1. Committee Draft 1.7 for the proposed revision of ISO 1989:1985, COBOL standard - PDF version. I use Adobe Acrobat Reader, version 3.

Comments:

1 Title page

There are various horizontal bars that seem extraneous. I don't have a hardcopy version, so this problem may not apply there.

2 Contents & Annex C.4.2 Recursive and initial programs, pages xiv & 726

This is identified separately in the table of contents (arguably not a bad idea in this particular case, though inconsistent), but its heading is used on pages from its start position in Annex 4 to the very end of Annex 4, even where it is not applicable.

3 Contents, page xiv

Annex F: shouldn't there be some way of indicating the appropriate letter of this grouping, in a manner similar to that of Annex E? Perhaps all the preliminary headings for the annexes should commence with their primary letter, e.g. "A Communications facility", as is done with numbers for the numbered sections, e.g. "16 Standard classes".

4 Conformance, 3.1.6, Reserved words, page 4

Perhaps insert "shall" at the beginning of line 3, to be consistent with the rest of the sentence.

5 Definitions, 4.197, Floating-point numeric literal, page 17

I think it would be preferable to follow the general form used to describe a fixed-point numeric literal, for example by adding the following text to that already present:

"and is expressed as a literal comprising the significand and radix in that order with no intervening spaces. The significand is expressed in the same way as a fixed-point numeric literal. The radix is expressed by the upper or lower case letter "E", followed immediately by a plus or minus sign, followed immediately by another fixed-point numeric literal of from one to three numeric digits only.".

If this is considered too long-winded, then perhaps just the first partial sentence should be added.
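
The proposed wording can be read as a simple pattern. The following is a minimal sketch in Python, not text for the standard. It assumes that a fixed-point numeric literal is an optional sign followed by digits with an optional decimal point that is not the last character; the names FIXED_POINT, FLOATING_POINT and is_floating_point_literal are illustrative only.

```python
import re

# Assumed shape of a fixed-point numeric literal: optional sign, digits,
# and an optional decimal point that is not the last character.
FIXED_POINT = r"[+-]?(?:\d+(?:\.\d+)?|\.\d+)"

# Proposed wording: the significand, then "E" or "e", then a mandatory
# plus or minus sign, then one to three digits, with no intervening spaces.
FLOATING_POINT = re.compile(rf"{FIXED_POINT}[Ee][+-]\d{{1,3}}")


def is_floating_point_literal(text: str) -> bool:
    """Return True if the whole of text fits the proposed description."""
    return FLOATING_POINT.fullmatch(text) is not None
```

Under this reading "1.5E+10" and "-12.345e-003" are accepted, while "1.5E10" (no sign after the "E") and "1.5 E+10" (intervening space) are rejected.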

6 Definitions, 4.216, Identifier, page 18

Consider whether this is too restrictive; also see 8.4.2 Identifiers. Procedures should perhaps also be included, especially functions and methods. It is arguable that references to these latter are to the temporary data items that are implied by such references, and conceivably the same could be said for called programs. However, the term identifier implies identification of any identifiable feature, resource, service or user defined element of the language rather than just the data. The term "data-identifier" would be a more accurate description for the current definition. However, it may be that even if a revision is considered desirable, it would be better left to the next standard.

7 Definitions, 4.314, Numeric Function, page 23

Insert a comma between "numeric" and "but".

8 Definitions, 4.xxx, Text-word

It might be beneficial to add this term, perhaps with the following description derived from "7.1.1.4 Text-words":

"A character string in source or library text that constitutes an element processed by the text manipulation statements COPY and REPLACE."

9 Definitions, 4.477, User-defined word, page 31

While true, I don't think it is particularly helpful, at least not on its own. Perhaps one should add something along the following lines:

"Such words are mainly used as identifiers for data items and procedures.".

or

"Such words are mainly identifiers used to name the elements and properties of elements of a program that the user has specified the characteristics of using the reserved words and other elements of the language.".

As a general point, I am not sure that level numbers fit very comfortably within the grouping of elements comprising user-defined words.

10 Reference Format, 6

I like the idea of concatenation of literals rather than continuation as expressed by Bryan Randall in 99-0423. I am not keen on his idea of substituting "AND" for "&". But, while I think that the "&" symbol is a reasonable way of identifying concatenation in a manner that is easily understood, one major benefit of using words of several characters rather than equivalent single character symbols with the same meaning, is that, when writing and amending programs, it is harder to create inadvertent additions, deletions, substitutions or transpositions that can still be syntactically valid. However, in the case of concatenation I think it is rather difficult to create errors of this nature with delimited literals, because the concatenation operator needs to be between closing and opening delimiters, with spaces for separators.

I think that continuation of COBOL words, literals and picture character-strings should be phased out of the standard as soon as possible, ideally as an exception to the usual procedure of making it obsolete first. Doing so would make the rules for reference format and the COPY and REPLACE statements much easier to develop and understand. If it is possible for a compiler to recognise and handle such a feature, it should be reasonably easy for automatic conversion programs to do the same when converting continued COBOL words, literals and picture character-strings. The committee could perhaps even provide a single-purpose program to do the conversion to make the sudden change more palatable to users, though it is arguable that it would be better as part of an automatic conversion program to deal with all problems at once. As a programmer I have never continued COBOL words, literals or picture character-strings from one line to another. I have only very rarely seen literals continued and never seen a COBOL word or picture character-string continued. When I needed to provide values for large data items, I always subdivided them into manageably sized FILLER data items first.

(There are some possible uses of the COPY and REPLACE statements that might be difficult to convert, for example replacing large continued literals. In such cases, it would be necessary for the conversion to convert the currently matching replaced text for both COPY and REPLACE statements and the other source and library text so that the literals are broken and concatenated in the same way, so that they can still be matched. Probably, continued literals used as replacing operands should be flagged for user-intervention.)
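
The kind of automatic conversion suggested above might be sketched as follows. This is a minimal Python sketch under stated assumptions, not a definitive converter. It assumes the conventional fixed-form layout (indicator in column 7, program text ending at column 72), that spaces up to column 72 on a continued line belong to the literal, and that the literal resumes after the first quotation mark on each continuation line. It handles one alphanumeric literal delimited by quotation marks and ignores doubled quotation marks inside the literal. The names literal_fragments and as_concatenation are illustrative only.

```python
TEXT_END = 72   # assumed last column of the program-text area
INDICATOR = 6   # zero-based index of column 7, the indicator area


def literal_fragments(lines: list[str]) -> list[str]:
    """Collect the pieces of one literal continued over fixed-form lines.

    Assumes lines[0] holds the opening quotation mark and no other literal,
    and that every later line has "-" in column 7.
    """
    first = lines[0].ljust(TEXT_END)[:TEXT_END]
    start = first.index('"', INDICATOR + 1) + 1
    fragments = [first[start:]]        # spaces up to column 72 are kept
    for line in lines[1:]:
        line = line.ljust(TEXT_END)[:TEXT_END]
        if line[INDICATOR] != "-":
            raise ValueError("expected a continuation line")
        text = line[INDICATOR + 1:]
        resume = text.index('"') + 1   # literal resumes after this mark
        fragments.append(text[resume:])
    # The last fragment stops at the closing quotation mark.
    fragments[-1] = fragments[-1][:fragments[-1].index('"')]
    return fragments


def as_concatenation(fragments: list[str]) -> str:
    """Render the fragments as literals joined by the "&" operator."""
    return " &\n".join(f'"{fragment}"' for fragment in fragments)
```

The output is not laid out to fit any particular reference format; a real conversion program would also have to re-break the pieces so that each line stays within the margins, and would need the same breaking applied to any COPY or REPLACE operands that must still match.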

11 Coded character sets and reference format

This mainly involves 6, Reference format and 8.1, Character sets.

I think that the note in 6, item 1c should be expanded to emphasise that the number of bytes used to represent a character may vary between alphanumeric and national coded character sets and that therefore even a fixed-form reference format line is variable in length in terms of the number of bytes needed to represent the characters contained. I seem to remember that one of the current year's COBOL papers mentioned this, though I haven't been able to find it again.

I think that this should also be listed in Annex D.2, Substantive changes not affecting existing programs, perhaps as an item entitled "Fixed-form reference format line length".

I don't think that the standard specifies that either an alphanumeric or a national character should occupy an integral number of bytes. I think that a clear statement one way or the other is highly desirable in "8.1.1, Computer's coded character set". An amendment could also be made to the "USAGE clause, 13.16.61.3, general rule 8, fourth line" to replace "multiple" by "integral multiple". "USAGE clause 13.16.61.3 general rule 7" could be amended to state that each character shall be represented by an integral number of bytes. "Annex D.1, Substantive changes potentially affecting existing programs, item 7, Size of characters for USAGE DISPLAY" states that "The size of a character represented in USAGE DISPLAY is defined to be the same as the size of a byte in the architecture of the computer.". The USAGE clause doesn't make such a statement, see "13.16.61.3 general rule 7".
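
The point about the length of a fixed-form line can be illustrated with a short Python sketch. The line width of 80 character positions and the choice of UTF-8 and UTF-16 as coded character sets are assumptions made purely for illustration and are not taken from the draft; byte_length is a hypothetical helper.

```python
LINE_WIDTH = 80  # assumed width in character positions, for illustration


def byte_length(line: str, encoding: str) -> int:
    """Pad the line to LINE_WIDTH characters and count its encoded bytes."""
    return len(line.ljust(LINE_WIDTH).encode(encoding))


ascii_line = '           DISPLAY "HELLO".'
mixed_line = '           DISPLAY N"日本語".'

print(byte_length(ascii_line, "utf-8"))      # 80
print(byte_length(mixed_line, "utf-8"))      # 86, three characters need 3 bytes each
print(byte_length(mixed_line, "utf-16-le"))  # 160
```

Both lines occupy the same number of character positions, yet the number of bytes differs with the content and with the encoding.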

12 Reference format 6, page 36

Item 1b, perhaps replace "reference format" by "both reference formats" or by "both fixed-form and free-form reference formats". On the other hand one could argue that item 1b is superfluous as its contents are effectively stated in the third sentence of the introductory paragraph.

13 Reference format 6, page 36

Item 1d, perhaps add the word "the" after the first word "For".

14 Reference format 6.1.2, Floating indicators, page 37

Literal continuation indicator, second line, "symbol" is misspelt.

15 Reference format 6, continuation, generally

I think that the terminology for the continuation of lines is confusing. The first paragraph of "6.2.4 Continuation of lines" states the general case where no continuation markers are used, but the second paragraph proceeds to qualify this by defining special meanings for the terms continuation lines and continued lines and, arguably therefore, implying that the wider sense of continuation without continuation indicators does not apply to the first paragraph. The first and second paragraphs of "6.3.1 Continuation of lines" are similarly confusing. Maybe adding the text "Additionally," to the beginning of both second paragraphs would reduce the chance of confusion. Also, it may be desirable to devise suitable terminology to identify continuation and continued lines with and without continuation markers.

My comment in item 7 of my earlier paper 99-0519 suffered to some extent from my misunderstanding of these paragraphs.

Item 1d, perhaps add the sentence "Also each line that is not a continued line is treated as if it were followed by a separator space." This would probably be a better solution to the problem identified in item 7 of my earlier paper 99-0519. Arguably, the last paragraph of 6.2.4, Continuation of lines, covers the position for fixed-form reference format, though I don't think that it is the best placement, as people reading the standard would tend not to look at the rules for continuing lines to find the appropriate rules for non-continued lines.

Rules that require a preceding or following space would, I think, be improved by specifying the requirement as "a preceding real or assumed space" or "a following real or assumed space", e.g.

"Reference format, 6.2.6.1, Comment lines" - see below,

"COPY statement, 7.1.2.2, Syntax rule 2",

"REPLACE statement, 7.1.3.2, Syntax rule 2",

"8.3.2 Separators, generally", maybe an introductory note referring to the use of assumed spaces would be helpful, perhaps with a suitable reference to 6 Reference format.

"8.3.2 Separators, item 5, para 6", commencing "The opening delimiter shall be immediately preceded by a space ..." and also acknowledging that the last non-blank character of a non-continued line is presumed to be followed by a space,

"8.3.2, item 6 for pseudo-text" similarly,

"8.3.1 Character strings", where a character string commences in column one of a line, or where it terminates in the last character position of a line that is not continued.

16 Reference format 6.2.6.1, Comment lines, page 40

Perhaps insert ", which is preceded only by real or assumed spaces" at the end of the first sentence.

17 Reference format 6.2.7, Debugging lines, page 40

Second line, perhaps replace "any place" by "anywhere".

18 Reference format 6.3.3, Comments, page 41

Third para, I thought that what an implementor puts in a source listing was implementor dependent. See 7.2.12, LISTING directive and B.1 Implementor-defined language element list, item 115, "LISTING and PAGE directives (whether and when the compiler produces a listing)".

19 Reference format 6.3.3.1, Comment lines, page 41

Maybe insert "non-blank" before "character-string".

20 Reference format 6.3.3.2, Inline comments, page 41

Maybe insert "non-blank" before "character-strings".

21 Reference format 6.4, Logical conversion, page 42

Review interaction between items 7 and 9.

Maybe add the text "that does not follow a line continued with a floating literal continuation indicator" to the end of the first line of item 7.

22 Source text manipulation, 7.1.1.3, Pseudo-text, page 45

I am not sure how pseudo-text is continued. Is pseudo-text everything between the opening and closing delimiters, irrespective of how many lines are involved? If so, then rules to that effect would be in order, possibly in reference format with the other continuation rules. Is it allowable for either pseudo-text delimiter to appear on its own on a beginning or terminating line, or should there be rules similar to those of the continuation of alphanumeric, national and boolean literals, but without continuation indicators? As I understand it, any real or assumed spaces immediately following the opening delimiter would be ignored as would those immediately preceding the closing delimiter. I think that it would be undesirable to allow the two characters comprising the pseudo-text delimiter to be split for continuation and that rules similar to those for literal delimiters should apply.

It might be considered that some of this is self-evident, but it took me quite a while to determine this from the various scattered rules of reference format and the COPY and REPLACE statements.
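
My reading of the rules can be expressed as a minimal Python sketch. It assumes that pseudo-text is everything between a pair of "==" delimiters regardless of line breaks, that each line is treated as if followed by one separator space, and that spaces next to the delimiters are ignored. It does not allow for "==" appearing inside a literal or for fixed-form column areas, and the name extract_pseudo_text is illustrative only.

```python
def extract_pseudo_text(lines: list[str]) -> list[str]:
    """Return each pseudo-text found between "==" delimiters.

    A pseudo-text may span several lines; a delimiter may stand alone
    on a line.
    """
    # Treat each line as if it were followed by one assumed space.
    text = " ".join(line.rstrip() for line in lines)
    parts = text.split("==")
    # An even number of parts means an odd number of delimiters.
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced pseudo-text delimiters")
    # Odd-numbered parts lie between an opening and a closing delimiter.
    return [part.strip() for part in parts[1::2]]


source = [
    "REPLACE ==",
    "    OLD-AMOUNT",
    "== BY == NEW-AMOUNT ==.",
]
print(extract_pseudo_text(source))  # ['OLD-AMOUNT', 'NEW-AMOUNT']
```

If the draft intends something different, for example that a delimiter may not stand alone on a line, then explicit rules would make that clear.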

23 Source text manipulation, 7.1.1.4, Text words, page 46

Item 1, third line, split "ofcontext" into separate words.

24 COPY Statement, 7.1.2.1, General format, page 47

Are "text-name", "library-name", "word", "pseudo-text", "text" and "partial-word" meta-terms or should they be defined or listed in "8.3.1.1.1 User defined words"? Should "word" be "COBOL word"?

25 COPY Statement, 7.1.2.1, General format, page 47

I think there is a case for making the use of "literal-1", "literal-2", "word-n" and "text-n" obsolete. I think it would be fairly easy for an automatic conversion program to make the change to existing programs.

26 COPY Statement, 7.1.2.2, Syntax rules, page 47

Rule 11, the two instances of the text "qualified-data-name-with-subscripts, reference modification", implies that reference modification is an identifier. Perhaps it should be rephrased as "qualified-data-name-with-or-without-subscripts-and-with-or-without-reference-modification".

Line two, perhaps there should be an additional sentence similar to that commencing with "If subscripting ..." to deal with reference modification.

27 COPY Statement, 7.1.2.2, Syntax rules, page 48

Rule 12, replace "initiator" by "indicator".

28 COPY Statement, 7.1.2.2, Syntax rules, page 48

Rule 13: is the maximum length really 322 characters and, if so, why? It was the same in COBOL 85.

29 COPY Statement, 7.1.2.2, Syntax rules, page 48

(I now think that the only reason sole commas and semicolons are disallowed is because of their use as separators in the replacement process.)

(I still can't make up my mind whether the following is worth considering.)

Rule 14, I think there is a case for excluding a sole period as well as sole separator commas and semicolons. After all, a compiler would not be able to make much sense of the result. A single separator space could also be reasonably excluded, as perhaps could any number of spaces with no other characters. Is the term separator needed in the rule? Arguably single spaces, parentheses, quotes and apostrophes are already excluded from having any effect by the specification of "7.1.1.4 Text-words", though they could still be present.

Consider other single characters that should not be allowed, e.g. hyphens, ampersands, quotes, apostrophes, plus and minus signs, relational operators. Should any single characters be allowed? Maybe only single characters not in the COBOL character repertoire should be allowed. Should character strings representing single reserved or context sensitive words be allowed (there may be good cases where such words should be allowed, though I haven't yet thought of any)? Are there other multiple groupings of characters that should not be allowed, for example two colons, two asterisks, and the relational operators ">=" and "<="?

There is presumably a limit on how much protection against misuse the standard should provide. Also, it may be that some replacing operations need to be able to change certain items on a limited scale that would not be acceptable if done globally.

30 COPY Statement, 7.1.2.3, General rules, page 48

Rule 7, when using literal-3 and literal-4, are the enclosing literal delimiters part of the text to be replaced and substituted? I think that a statement to specify this one way or the other is highly desirable. Arguably, "7.1.1.4 Text-words" specifies that the opening and closing delimiters are included, though I still think that rule 7 would be clearer for saying so explicitly when literals are substituted in this way for pseudo-text.

31 COPY Statement, 7.1.2.3, General rules, page 48

Rule 8c item 1, insert "resultant" after "Each" at the front of the second sentence to make it clear that this applies to a sequence of presumed spaces as well.

32 COPY Statement, 7.1.2.3, General rules, page 50

Rules 11 and 14 seem to conflict with respect to the treatment of comments and blank lines. Would it be better to explicitly specify both "inline comments" and "comment lines" rather than just "comments"?

33 REPLACE Statement, 7.1.3, Generally

Some of the above comments on the COPY statement also apply to the REPLACE statement.

34 Character strings, 8.3.1.2.2, Numeric literals, page 88

It would be desirable to be able to use grouping separators in numeric literals. This would make programs more readable and would reduce the likelihood of errors by programmers when defining and amending large numeric literals.

35 General, Floating-point data-items, mainly USAGE clause 13.16.61.3

Consider whether it should be explicitly specified that each such data item should occupy a whole number of bytes and start at a byte boundary. I realise that internal representation, alignment and occupancy are implementor defined, and would agree with that up to a point, but wonder whether it is intended to allow such data items to be treated like usage BIT. See "USAGE clause 13.16.61.3, General rules". Similar comments apply to usage BINARY, COMPUTATIONAL, INDEX, PACKED-DECIMAL, BINARY-xxx, OBJECT-REFERENCE, POINTER and PROGRAM-POINTER. BIT and NATIONAL are obvious justifiable exceptions, if national characters can be represented by a non-integral multiple of bytes that is one or greater.

Does it matter? If not, shouldn't there be rules similar to those in "8.5.1.6.1.3 Alignment of data items of usage bit", which could also be applied to usage national?

Also see the provisions of "8.5.1.6 Standard data alignment rules" and "8.5.1.7 Item alignment for increased efficiency".

36 Culturally-specific, culturally-adaptable and multilingual applications, C.13.2.4, Locale-based case classification of letters, page 765

Sixth para or separate text grouping, replace "picture string" by "picture character-string" for consistency with the rest of the standard.